Offline Handwritten Arabic Character Segmentation with Probabilistic Model

Authors

  • Pingping Xiu
  • Liangrui Peng
  • Xiaoqing Ding
  • Hua Wang
Abstract

Research on offline handwritten Arabic character recognition has received increasing attention in recent years because of the growing need for Arabic document digitization. Variation in Arabic handwriting makes character segmentation and recognition difficult; e.g., the sub-parts (diacritics) of an Arabic character may shift away from the main part. In this paper, a new probabilistic segmentation model is proposed. First, a contour-based over-segmentation method cuts the word image into graphemes. The graphemes are sorted into three queues: character main parts, sub-parts (diacritics) above the main parts, and sub-parts below the main parts. The confidence for each character is calculated by the probabilistic model, taking recognizer output, geometric confidence, and logical constraints into consideration. Then, a global optimization is conducted to find the optimal cutting path, taking the weighted average of character confidences as the objective function. Experiments on handwritten Arabic documents with various writing styles show that the proposed method is effective.
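The global optimization step in the abstract can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' method: the abstract does not specify the weights, so each candidate character is weighted here by the number of graphemes it spans (which keeps the total weight constant and makes the weighted average decomposable). The names `best_segmentation`, `confidence`, and `max_span` are hypothetical, and `confidence(start, end)` stands in for the probabilistic model's score for merging graphemes `start..end-1` into one character.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open range [start, end) of grapheme indices


def best_segmentation(
    n_graphemes: int,
    confidence: Callable[[int, int], float],
    max_span: int = 4,
) -> Tuple[float, List[Span]]:
    """Find the cutting path with the highest weighted-average confidence.

    Assumption (not from the paper): each character's weight is the number
    of graphemes it covers, so the weights always sum to n_graphemes and
    the objective reduces to an additive score that a DP can maximize.
    """
    if n_graphemes == 0:
        return 0.0, []

    neg_inf = float("-inf")
    best = [neg_inf] * (n_graphemes + 1)  # best[i]: top score for graphemes [0, i)
    back = [0] * (n_graphemes + 1)        # back[i]: start of the last character
    best[0] = 0.0

    for end in range(1, n_graphemes + 1):
        for start in range(max(0, end - max_span), end):
            # Weighted contribution of treating [start, end) as one character.
            score = best[start] + (end - start) * confidence(start, end)
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers to recover the chosen character spans.
    spans: List[Span] = []
    end = n_graphemes
    while end > 0:
        start = back[end]
        spans.append((start, end))
        end = start
    spans.reverse()

    return best[n_graphemes] / n_graphemes, spans
```

Calling `best_segmentation(4, conf)` with a toy `conf` that scores the spans `(0, 2)` and `(2, 4)` highly and everything else poorly returns those two spans as the cutting path. In a real system the confidence function would combine recognizer output, geometric confidence, and logical constraints as the abstract describes; how those terms are combined is not given in this summary.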

Similar resources

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

An Experimental Approach for Recognizing Handwritten Arabic Words

This paper discusses the process of implementing an off-line system for recognizing handwritten Arabic words. In order to recognize a word, its character decomposition should be known. This is done through segmentation. In our model, Arabic character recognition goes through a preprocessing stage followed by a recognition stage. Each character of the word is investigated in order to determine i...

Offline Handwritten Arabic Word Recognition Using HMM - a Character Based Approach without Explicit Segmentation

This paper presents the IfN’s Offline Handwritten Arabic Word Recognition System. The system uses Hidden Markov Models (HMM) for word recognition, and is based on character recognition without explicit segmentation. The first part of this paper deals with databases for word recognition systems, and in particular, the IFN/ENIT database. The second part gives a short description of the pre-proces...

A Survey on Arabic Character Recognition

Off-line recognition of text plays a significant role in several applications, such as the automatic sorting of postal mail or the editing of old documents. It is the ability of the computer to distinguish characters and words. Automatic off-line recognition of text can be divided into the recognition of printed and handwritten characters. Off-line Arabic handwriting recognition still faces great challen...

Recognising handwritten Arabic manuscripts using a single hidden Markov model

This paper presents a new method for off-line recognition of handwritten Arabic script. The method does not require segmentation into characters, and is applied to cursive Arabic script, where ligatures, overlaps and style variation pose challenges to the recognition system. The method trains a single hidden Markov model (HMM) with the structural features extracted from the manuscript words. The...

Publication date: 2006